A Profiling Tool for Detecting Cache-Critical Data Structures
Abstract
Poor cache behavior can significantly limit the speedup and scalability of parallel applications, so optimizing a program for cache locality can yield considerable performance gains. Programmers therefore usually perform cache locality optimization to reach the performance they expect from their applications. In this work, we developed a data profiling tool, dprof, that supports users in this task by helping them detect the optimization targets in their programs. In contrast to similar tools, which mostly focus on code regions, we address data structures because they are the objects that programmers work with directly. Based on the Performance Monitoring Unit (PMU) provided by modern processors, dprof can find cache-critical variables, arrays, or even segments of an array. It can also narrow these access hotspots down to concrete positions such as individual functions and code lines. This allows the user to apply dprof for efficient cache optimization.
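As a rough illustration of what data-centric attribution means, the sketch below maps sampled cache-miss addresses onto named data structures and counts the misses per structure. It is not dprof's implementation: the symbol table, the sample addresses, and the function name `attribute_misses` are all hypothetical, and a real tool would obtain the samples from the PMU and the address ranges from debug information or allocation tracking.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (start address, size in bytes, name).
SYMBOLS = [
    (0x1000, 4096, "grid"),
    (0x3000, 8, "counter"),
    (0x4000, 1024, "weights"),
]

def attribute_misses(samples, symbols):
    """Count sampled cache-miss addresses per data structure."""
    table = sorted(symbols)
    starts = [start for start, _, _ in table]
    counts = Counter()
    for addr in samples:
        # Find the last structure whose start address is <= addr.
        i = bisect_right(starts, addr) - 1
        if i >= 0:
            start, size, name = table[i]
            if addr < start + size:
                counts[name] += 1
                continue
        # Address falls outside every known structure.
        counts["<unknown>"] += 1
    return counts

# Hypothetical miss addresses, standing in for PMU samples.
samples = [0x1000, 0x1040, 0x1FFF, 0x3004, 0x2500, 0x4100]
misses = attribute_misses(samples, SYMBOLS)
for name, n in misses.most_common():
    print(name, n)
```

Sorting the structures by start address keeps each lookup logarithmic, which matters when a profile contains many samples. Splitting a large array into fixed-size segments before counting would give the per-segment view mentioned in the abstract.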
Similar resources
Locating Cache Performance Bottlenecks Using Data Profiling
Effective use of CPU data caches is critical to good performance, but poor cache use patterns are often hard to spot using existing execution profiling tools. Typical profilers attribute costs to specific code locations. The costs due to frequent cache misses on a given piece of data, however, may be spread over instructions throughout the application. The resulting individually small costs at ...
Multi-Cache Profiling of Parallel Processing Programs Using Simics
This paper presents a multi-cache profiler for shared memory multiprocessor systems. For each of the program's static data structures, the profiler outputs the read- and write-miss frequencies that are due to cache line migrations. Those static data structures whose manipulation results in excessive cache line migrations (potentially a source of excessive false misses) are identified. The...
Refactoring Intermediately Executed Code to Reduce Cache Capacity Misses
The growing memory wall requires that more attention be given to the data cache behavior of programs. This paper focuses on capacity misses, i.e., the misses that occur because the cache size is smaller than the data footprint between the use and the reuse of the same data. The data footprint is measured with the reuse distance metric, by counting the distinct memory locations ...
Data Locality Analysis of the SPECfp 95
This paper presents a detailed analysis of the locality exhibited by the SPECfp95 benchmark suite. This study is performed by means of a tool that is based on a static analysis enhanced by simple profiling. This new approach results in a fast, accurate and flexible data locality analysis tool. It is fast because its run-time overhead is almost negligible, having a slowdown of 0.05. Besides, ...
Unobtrusive Reactive Prefetching: A Multicore Approach for Exploiting Hot Streams in Cache Misses
Processor performance continues to outpace memory performance by a large margin. One approach for mitigating this gap is to employ software-based speculative prefetching. Software dynamic prefetchers are able to identify patterns more complex than those of hardware prefetchers while retaining the ability to respond to a program's dynamic behavior; however, modern techniques incur prohibitively hi...
Publication date: 2007